Quality-Configurable Memory Hierarchy Through Approximation

Authors

  • Amir M. Rahmani
  • Nikil Dutt
  • Majid Shoushtari

Abstract

The memory subsystem is a major contributor to the overall performance and energy consumption of embedded computing platforms. The emergence of "killer" applications such as data-intensive recognition, mining, and synthesis (RMS) applications puts even more stress on the memory subsystem and exacerbates its energy consumption. Traditional mechanisms to ensure data integrity deploy overdesign (e.g., redundancy and error detection/correction) and/or guardbanding that consumes a significant part of the energy consumed in the memory subsystem. We explore opportunities for energy efficiency by exploiting the intrinsic tolerance of a vast class of approximate computing applications to some level of error in the on-chip memory hierarchy. We present two exemplars outlining the typical software and hardware mechanisms that are required for different components in the memory hierarchy, implemented in varying technologies such as SRAM and STT-MRAM.
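
To make the idea of trading memory reliability for energy concrete, here is a minimal Python sketch; it is not the authors' mechanism. It assumes a simple fault model in which an approximate memory region (for instance, SRAM operated at a lowered supply voltage) returns data with independent random bit flips, and that only the low-order bits of each word are exposed to errors while the most significant bits stay reliable. The names `store_approximate`, `bit_error_rate`, and `protected_msbs` are hypothetical.

```python
import random


def store_approximate(words, bit_error_rate, protected_msbs, word_bits=16, seed=0):
    """Model reading back words kept in a hypothetical quality-configurable region.

    Assumption (not from the paper): errors appear as independent bit flips,
    each unprotected bit flipping with probability `bit_error_rate`; the top
    `protected_msbs` bits are assumed to be kept reliable.
    """
    rng = random.Random(seed)            # fixed seed so runs are reproducible
    approx_bits = word_bits - protected_msbs
    result = []
    for word in words:
        for bit in range(approx_bits):   # only the low-order bits are exposed to errors
            if rng.random() < bit_error_rate:
                word ^= 1 << bit         # flip this bit
        result.append(word)
    return result


def max_abs_error(exact, approx):
    """Largest per-word deviation; bounded by 2**(word_bits - protected_msbs) - 1."""
    return max(abs(e - a) for e, a in zip(exact, approx))


if __name__ == "__main__":
    data = list(range(0, 60000, 600))   # 100 sample 16-bit values
    noisy = store_approximate(data, bit_error_rate=0.05, protected_msbs=8)
    print("max abs error:", max_abs_error(data, noisy))  # never exceeds 255 here
```

Under these assumptions, raising `protected_msbs` tightens the worst-case per-word error bound, 2^(word_bits - protected_msbs) - 1, but leaves fewer bits that could be stored in a cheaper, error-prone mode; this is one way to picture a quality knob for approximate data.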

Similar resources

Ultra-Efficient Content Addressable Memory for Tunable GPU Approximation

In this paper, we describe a resistive configurable associative memory (ReCAM) that enables selective approximation and asymmetric voltage overscaling to manage delivered efficiency. The ReCAM structure matches an input pattern with pre-stored ones by applying an approximate search on selected bit indices (bitline-configurable). To further reduce energy, we explore proper ReCAM sizing, various ...

Use of an Embedded Configurable Memory for Stream Image Processing

We examine the use of the embedded Blackfin BF561 processor for high-definition image processing using the stream model of computing. The Blackfin features a configurable memory hierarchy that minimizes the Memory Wall effect. We describe the stream model and its application to the BF561 to utilize low-latency on-chip memory and compare to a worst-case baseline using SDRAM only. We find a 2X to...

Simulation and Architectural Exploration of a Shared-Memory Multiprocessor Node for Scientific Algorithms

In this thesis, GEMS (a Generic Environment for Multiprocessor Simulations) is presented. GEMS is a simulation environment written in the Superlog language, which simulates a configurable shared-memory multiprocessor system. Simulation focuses on the memory hierarchy and the system interconnect. Part of GEMS is a directory-based cache coherence protocol. This protocol is an adaptation of the bu...

Chapter 6: Tuning Caches to Applications for Low-Energy Embedded Systems

The power consumed by the memory hierarchy of a microprocessor can contribute to as much as 50% of the total microprocessor system power, and is thus a good candidate for power and energy optimizations. We discuss four methods for tuning a microprocessor's cache subsystem to the needs of any executing application for low-energy embedded systems. We introduce on-chip hardware implementing an effi...

Accelerating Blocked Matrix-Matrix Multiplication using a Software-Managed Memory Hierarchy with DMA

The optimization of matrix-matrix multiplication (MMM) performance has been well studied on general-purpose desktop and server processors. Classic solutions exploit common microarchitectural features including superscalar execution and the cache and TLB hierarchy to achieve near-peak performance. Typical digital signal processors (DSPs) do not have these features, and instead use in-order execu...

Publication date: 2017